Chiefs WR Rashee Rice cooperating with authorities following reported crash, lawyer says https://www.nfl.com/news/chiefs-wr-rashee-rice-cooperating-with-authorities-following-reported-crash-lawyer-says
A Multi-agent Reinforcement Learning Study of Evolution of Communication and Teaching under Libertarian and Utilitarian Governing Systems
Aslan S. Dizaji
https://arxiv.org/abs/2403.02369
Corral Some Zippy Blue Flames Into 3D Printed Troughs
https://poliverso.org/display/0477a01e-237504d0-e7bf32edd4324f0e
[Steve Mould] came across an interesting little phenomenon of blue flames zi…
Reversible single-pulse laser-induced phase change of Sb$_2$S$_3$ thin films: multi-physics modeling and experimental demonstrations
Capucine Laprais, Clément Zrounba, Julien Bouvier, Nicholas Blanchard, Matthieu Bugnet, Yael Gutiérrez, Saul Vazquez-Miranda, Shirly Espinoza, Peter Thiesen, Romain Bourrellier, Aziz Benamrouche, Nicolas Baboux, Guillaume Saint-Girons, Lotfi Berguiga, Sébastien Cueff
Military intelligence: Russia flies attack drones over occupied Zaporizhzhia plant, video shows: https://benborges.xyz/2024/05/02/military-intelligence-russia.html
Master equations with indefinite nonlinearities
Wenxiong Chen, Yahong Guo
https://arxiv.org/abs/2405.02091 https://arxiv.org/pdf/2405…
Multi-intent-aware Session-based Recommendation
Minjin Choi, Hye-young Kim, Hyunsouk Cho, Jongwuk Lee
https://arxiv.org/abs/2405.00986
This https://arxiv.org/abs/2403.11893 has been replaced.
Better & Faster Large Language Models via Multi-token Prediction
Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve
https://arxiv.org/abs/2404.19737 https://arxiv.org/pdf/2404.19737
arXiv:2404.19737v1 Announce Type: new
Abstract: Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency. More specifically, at each position in the training corpus, we ask the model to predict the following n tokens using n independent output heads, operating on top of a shared model trunk. Considering multi-token prediction as an auxiliary training task, we measure improved downstream capabilities with no overhead in training time for both code and natural language models. The method is increasingly useful for larger model sizes, and keeps its appeal when training for multiple epochs. Gains are especially pronounced on generative benchmarks like coding, where our models consistently outperform strong baselines by several percentage points. Our 13B-parameter model solves 12% more problems on HumanEval and 17% more on MBPP than comparable next-token models. Experiments on small algorithmic tasks demonstrate that multi-token prediction is favorable for the development of induction heads and algorithmic reasoning capabilities. As an additional benefit, models trained with 4-token prediction are up to 3 times faster at inference, even with large batch sizes.
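The core idea in the abstract (n independent output heads over a shared trunk, with head k predicting the token k+1 steps ahead) can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the embedding table stands in for the shared transformer trunk, and the vocabulary size, model width, and number of heads are arbitrary toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, D_MODEL, N_HEADS = 50, 16, 4  # toy sizes, not the paper's

# Shared trunk: an embedding lookup standing in for a full transformer.
embed = rng.normal(0.0, 0.02, (VOCAB, D_MODEL))
# n independent output heads, each a linear projection to the vocabulary.
heads = [rng.normal(0.0, 0.02, (D_MODEL, VOCAB)) for _ in range(N_HEADS)]

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_token_loss(tokens):
    """Mean cross-entropy where head k predicts the token k+1 steps ahead."""
    loss, count = 0.0, 0
    for t in range(len(tokens) - N_HEADS):
        h = embed[tokens[t]]  # trunk representation at position t
        for k, W in enumerate(heads):
            probs = softmax(h @ W)
            loss -= np.log(probs[tokens[t + 1 + k]])
            count += 1
    return loss / count

seq = rng.integers(0, VOCAB, size=32)
print(round(float(multi_token_loss(seq)), 3))
```

With near-zero weights each head is close to a uniform predictor, so the loss starts near log(VOCAB); training would update the heads and trunk jointly against this summed objective.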
Masked Multi-Domain Network: Multi-Type and Multi-Scenario Conversion Rate Prediction with a Single Model
Wentao Ouyang, Xiuwu Zhang, Chaofeng Guo, Shukui Ren, Yupei Sui, Kun Zhang, Jinmei Luo, Yunfeng Chen, Dongbo Xu, Xiangzheng Liu, Yanlong Du
https://arxiv.org/abs/2403.17425